Reliability and Fault Tolerance in Trust

Author

  • Jeffrey M. Voas
Abstract

The ubiquity of information systems has made the correct and reliable operation of critical systems indispensable. The trustworthiness of digital systems increasingly depends on the trustworthiness of their software. While hardware trustworthiness is by no means a solved problem, systemwide problems are increasingly blamed on poorly tested, defective software. System trustworthiness is therefore a combination of several key software attributes: reliability, safety, security, availability, performance, fault tolerance, and privacy. Some of these attributes can be measured directly; others cannot. For example, performance and availability can be measured numerically; safety and security cannot. Further, several of these attributes may conflict, such as security and performance. Demonstrating that the software of a system can be trusted therefore requires combining qualitative arguments about the level achieved for some attributes with the numerical (quantitative) scores measured for others. To understand the trustworthiness and security of a software system, we first need to understand its reliability and fault tolerance. I therefore propose the need for new thinking in these two areas. To begin, software reliability theory is one of industry’s seminal approaches for predicting the likelihood of software field failures. Unfortunately, the assumptions that software reliability measurement models make do not address the complexities of most software, resulting in far less adoption of theory into practice than is possible. Even though reliability models are quantitative, the industry uses their results only qualitatively. Testing is an example: when the mean time to failure (MTTF) falls below X according to a particular reliability model, testing stops, but developers and users cannot assume that the software will always behave with an MTTF less than X in the field.
The rationale for researching a new theory of software reliability is clear: current software reliability theories do not scale to the types of large-scale heterogeneous systems that are being fielded [1],[3]. Existing software reliability theories work more accurately in telecommunications and aerospace because telecom and aerospace software runs in embedded environments and is in many ways indistinguishable from the hardware on which it resides. In other disciplines, such as the production of commercial, shrinkwrap software, quality has historically been an add-on, of lesser market value than feature richness or short release cycles. Further, dozens of underlying variables concerning different hardware and computer configurations are not included in the traditional definition of the application’s operational profile, yet they cause the same application to have numerous different reliabilities depending on its environment. New research should therefore attempt to create a software reliability theory for all software, particularly COTS software and “componentware.” It should also examine the use of time in existing models and look at alternative parameters, such as the fault-detecting ability offered by the operational profile and the complexity of the code. Finally, it should examine how the current definition of an operational profile can be expanded to define the input space of non-embedded software more accurately. Other benefits of such research include the potential to create a composability theory for component-based systems by reducing the current problems of incompatible assumptions between interconnected components. Next, I will argue that true fault tolerance cannot be achieved without specialized testing that involves fault insertion techniques at the interface level.
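The claim that one application can have different reliabilities in different environments can be illustrated with a small sketch. It is not the theory the abstract calls for. It assumes, purely for illustration, that usage and environment are independent, and every name and number in it (`reliability`, the input classes, `config_a`, `config_b`, the failure probabilities) is hypothetical.

```python
def reliability(usage, environment, failure_prob):
    """Probability that a single run succeeds.

    usage:        input class -> probability of occurrence (sums to 1)
    environment:  configuration -> probability of occurrence (sums to 1)
    failure_prob: (input class, configuration) -> probability that a run fails

    Assumption (not from the source): usage and environment are independent.
    """
    for name, dist in (("usage", usage), ("environment", environment)):
        if abs(sum(dist.values()) - 1.0) > 1e-9:
            raise ValueError(f"{name} distribution must sum to 1")
    expected_failure = sum(
        p_in * p_env * failure_prob[(cls, cfg)]
        for cls, p_in in usage.items()
        for cfg, p_env in environment.items()
    )
    return 1.0 - expected_failure


# Hypothetical data: the same usage mix, observed on two configurations.
usage = {"edit": 0.8, "print": 0.2}
failure_prob = {
    ("edit", "config_a"): 0.001,
    ("print", "config_a"): 0.01,
    ("edit", "config_b"): 0.001,
    ("print", "config_b"): 0.10,  # e.g. a driver problem on config_b
}
site_1 = {"config_a": 1.0}
site_2 = {"config_b": 1.0}

if __name__ == "__main__":
    print(reliability(usage, site_1, failure_prob))
    print(reliability(usage, site_2, failure_prob))
```

A profile that records only the `usage` distribution would assign both sites the same reliability. Making the configuration part of the profile is what separates them.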
We are proposing new thinking that investigates the adoption of methodologies to assess the interoperability of components [2], without assuming that the code within the components is accessible. Such thinking contains two key components: (1) the ability to corrupt internal states during test execution, and (2) the required assertions to determine the impact of the corruptions as well as the rate of
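The two components named in this paragraph, state corruption and assertions, can be sketched for a component treated as a black box. This is a minimal illustration under stated assumptions, not the author's method. The components, the fault model in `corrupt`, the assertion in `output_ok`, and the summary metric `tolerated_fraction` are all hypothetical choices made for the example.

```python
import random


def black_box_sensor(raw):
    """Stand-in for a component whose source code is assumed unavailable."""
    return raw * 0.5


def unguarded_consumer(reading):
    """Passes the reading straight through as a valve setting."""
    return reading


def guarded_consumer(reading):
    """Clamps the reading into the allowed range before using it."""
    return max(0.0, min(100.0, reading))


def corrupt(value, rng):
    """Hypothetical fault model: scale the value by a random factor."""
    return value * rng.uniform(-10.0, 10.0)


def output_ok(setting):
    """Assertion on the observable output: setting must lie in [0, 100]."""
    return 0.0 <= setting <= 100.0


def tolerated_fraction(consumer, trials=1000, seed=0):
    """Fraction of injected corruptions after which the assertion still holds."""
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    passed = 0
    for _ in range(trials):
        reading = black_box_sensor(rng.uniform(10.0, 100.0))
        # Inject the fault at the interface between the two components.
        setting = consumer(corrupt(reading, rng))
        if output_ok(setting):
            passed += 1
    return passed / trials


if __name__ == "__main__":
    print("unguarded:", tolerated_fraction(unguarded_consumer))
    print("guarded:  ", tolerated_fraction(guarded_consumer))
```

The corruption is applied to the data crossing the interface, so the sketch needs no access to the code inside `black_box_sensor`. Comparing the two consumers shows how an assertion exposes whether a downstream component tolerates a corrupted state.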

Similar resources

Fault Tolerant and Distributed Broadcast Encryption

Reliability and trust are the main concerns of this paper. It improves reliability by looking at other schemes and extending them. The objective is to gain fault tolerance and to remove the need for trust in the broadcaster.

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique for realizing fault tolerance is periodic checkpointing, which periodically ...

Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)

Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...

A Microprocessor-Based Hybrid Duplex Fault-Tolerant System

Reliability is one of the fundamental considerations in the design of industrial control equipment. The microprocessor-based Hybrid Duplex fault-tolerant System (HDS) proposed in this paper has high reliability to meet this demand although its hardware structure is simple. The hardware configuration of HDS and the fault tolerance of this system are described. The switching control strategies in...

Towards Attack-Tolerant Networks: Concurrent Multipath Routing and the Butterfly Network

Targeted attacks against network infrastructure are notoriously difficult to guard against. In the case of communication networks, such attacks can leave users vulnerable to censorship and surveillance, even when cryptography is used. Much of the existing work on network fault-tolerance focuses on random faults and does not apply to adversarial faults (attacks). Centralized networks have single...

Presenting a Replicated Agent-Based Approach to Implementing a Reliable Mobile Code Pattern

Using mobile agents, it is possible to bring the code close to the resources, which is not foreseen by the traditional client/server paradigm. Compared to the client/server computing paradigm, the greater flexibility of the mobile agent paradigm comes at additional costs as well as the additional complexity of developing and managing mobile agent-based applications. Such complexity ...

Publication date: 2006